PURPOSE: To eliminate the need for exclusive control between jobs by
providing a control job that updates the control files, queuing the requests
issued by each job, and executing the processing that updates the system
control file for each request serially in time.

CONSTITUTION: A control job 14 accepts the access requests addressed to a
system control file 16 from the communication requester jobs 11-13, connects
these access requests to an access request queue 15, and performs the access
processing on the file 16 successively, starting from the request at the head
of the queue. For the single communication requester job taken out of the
queue 15, the job 14 carries out the update processing, for acquisition of
resources at the start of communication and for release of resources at the
end of communication, on all the control blocks (shown by oblique lines)
related to that job in a control block 17 of the file 16. The job 14 then
takes the next communication requester job out of the queue and executes it.



a: request, b: access processing




A section from the book entitled "Distributed Operating System - What Comes
After UNIX", pp. 169-176, written by Mamoru Maekawa et al.; date published:
December 25, 1991; published by Kyoritsu Shuppan Kabushiki Kaisha.



9.4 Control of Simultaneous Execution 
9.4.1 Common Clock 

The basic problem of simultaneous execution control in a system lies
in deciding and managing the order in which events occur in the system.
One way to solve this problem is to have a single node manage the order
of events (so-called centralized management). Many of the problems
peculiar to a distributed system can be avoided by using centralized
management. However, this method has problems of efficiency and
reliability. In a distributed system, it is desirable for the mechanism
that implements simultaneous execution control to be distributed as well.

In a distributed system, a processing request is generally transmitted
as a message. The time taken to transmit and receive a message is
variable. For example, when two processes A and B transmit messages
to a process P in the order A, B, the process P may receive the
messages in the order B, A; that is, the arrival order of the messages
may be reversed. Under such circumstances, the processes A and B need
a common clock so that the transmission time can be written into each
message.

The rules for deciding the order of events in a system are as follows:

(1) If an event a is executed before an event b inside one process,
then a → b; and

(2) If processes communicate with one another, and a is the transmission
of a message and b is its reception, then a → b.

A clock consistent with this ordering (i.e. if a → b, then the
occurrence time of a < the occurrence time of b) is obtained as
follows:

(1) each process (actually each node) has a counter, and the counter
is incremented every time a new event occurs; and

(2) when a message is exchanged between a process 1 and a process 2,
the counter value of the process 1 is attached to the message as a
timestamp, and if the timestamp is greater than the counter value of
the process 2 that has received the message, the counter of the
process 2 is updated to a value greater than the timestamp.

If the timestamps of two events happen to be identical, the process
number (node number) can be used to decide their order. In practice,
counter values close to the actual time are used. To do so, each node
in the system has a sufficiently accurate clock as its counter, and
when a message arrives at a node, the node readjusts its own clock
based on the transmission time of the message and a minimum estimate
of its delay. According to the result by Lamport [Lamport 78], all the
clocks in the system can be kept to within an error of about
d(2κτ + ξ), where κ is the error of a clock, d is the diameter of the
network (the number of nodes that a message may have to pass through
between the two most distant nodes), τ is the maximum interval between
message transmissions, and ξ is the part of the message delay that
cannot be predicted.
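
The counter rules and the tie-break by process number can be sketched as
follows. This is a minimal illustration, not code from the book; the class
name LamportClock and its method names are assumptions, and a message
reception is counted as an event in its own right.

```python
class LamportClock:
    """Logical clock for one process (node). Names are illustrative."""

    def __init__(self, pid):
        self.pid = pid      # process (node) number, used only to break ties
        self.counter = 0

    def tick(self):
        """Rule (1): advance the counter for every local event."""
        self.counter += 1
        return (self.counter, self.pid)

    def send(self):
        """Sending is an event; its counter value travels as the timestamp."""
        return self.tick()

    def receive(self, timestamp):
        """Rule (2): move past the sender's timestamp if it is ahead."""
        sent_counter, _ = timestamp
        self.counter = max(self.counter, sent_counter) + 1
        return (self.counter, self.pid)


p1, p2 = LamportClock(1), LamportClock(2)
stamp = p1.send()             # event on process 1
local = p2.tick()             # unrelated event on process 2, same counter value
arrival = p2.receive(stamp)   # reception is ordered after the transmission
```

Because each timestamp is a (counter, process number) pair, ordinary tuple
comparison orders events with equal counters by process number.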

9.4.2 Problems in exclusive control 

When exclusive control in a distributed system is performed by
centralized management, two messages are sufficient for a single
exclusive control: a request message sent from the requesting process
to the management process, and the response to that request. However,
centralized management has the problems described in the previous
section, and it is desirable to distribute the management of exclusive
control as well. Distributed management requires the following
properties:

(1) each node has the same amount of management information;

(2) each node makes decisions using an identical algorithm;

(3) each node bears an equal share of the load;

(4) the cost of initiating the control is the same from every node;
and

(5) a fault in a single node does not bring down the whole system.

The assumptions and characteristics common to the algorithms described
below are as follows:

(1) each node contains a single process;

(2) messages are processed in the order of their transmission times;

(3) messages are received correctly within a finite time; and

(4) the network is fully connected.

A basic solution to the problem is the algorithm of Lamport
[Lamport 78]. This algorithm uses the timestamps described earlier.

(1) a process 1 requesting a resource enters its request into a queue
and transmits a REQUEST message to all other processes;

(2) a process that has received the REQUEST message enters the request
of the process 1 into its queue and returns a REPLY message to the
process 1;

(3) when the request of the process 1 is at the head of the queue, and
REPLY messages newer than the request of the process 1 have been
received from all other processes, the process 1 can access the
resource;

(4) when the process 1 releases the resource, it removes its request
from the queue and transmits a RELEASE message to all other
processes; and

(5) a process that has received the RELEASE message removes the request
of the process 1 from its queue.

Using this method, 3(N-1) messages are required for a single
exclusive control, where N is the total number of processes.
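
The five steps and the message count can be checked with a small in-memory
simulation. This is only a sketch under simplifying assumptions: messages
are delivered immediately and in order, only one process requests the
resource, and the names Node and Network are hypothetical.

```python
class Node:
    def __init__(self, pid):
        self.pid = pid
        self.clock = 0
        self.queue = []        # pending requests as (timestamp, pid)
        self.last_seen = {}    # newest message timestamp received from each peer


class Network:
    """Delivers every message at once (a simplifying assumption)."""

    def __init__(self, n):
        self.nodes = [Node(pid) for pid in range(n)]
        self.sent = 0          # total number of messages transmitted

    def others(self, node):
        return [m for m in self.nodes if m is not node]

    def send(self, src, dst, kind, request):
        self.sent += 1
        src.clock += 1
        stamp = src.clock
        dst.clock = max(dst.clock, stamp) + 1
        dst.last_seen[src.pid] = stamp
        if kind == "REQUEST":
            dst.queue.append(request)
            self.send(dst, src, "REPLY", request)    # step (2)
        elif kind == "RELEASE":
            dst.queue.remove(request)                # step (5)

    def request(self, node):                         # step (1)
        node.clock += 1
        req = (node.clock, node.pid)
        node.queue.append(req)
        for peer in self.others(node):
            self.send(node, peer, "REQUEST", req)
        return req

    def may_enter(self, node, req):                  # step (3)
        at_head = min(node.queue) == req
        heard = all(node.last_seen.get(p.pid, 0) > req[0]
                    for p in self.others(node))
        return at_head and heard

    def release(self, node, req):                    # step (4)
        node.queue.remove(req)
        for peer in self.others(node):
            self.send(node, peer, "RELEASE", req)


net = Network(4)
node = net.nodes[0]
req = net.request(node)
entered = net.may_enter(node, req)
net.release(node, req)
messages = net.sent        # REQUEST, REPLY and RELEASE to each of N-1 peers
```

With N = 4 the run transmits 3 REQUEST, 3 REPLY and 3 RELEASE messages,
matching the 3(N-1) figure.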

In contrast, the algorithm of Ricart and Agrawala [Ricart 81] works
as follows:

(1) a process 1 requesting a resource transmits a REQUEST message
to all other processes;

(2) a process that has received the REQUEST message returns a REPLY
message only if the REQUEST message of the process 1 is older than
its own request, or if it is not requesting at all. The REPLY message
is withheld if the REQUEST message of the process 1 is not older than
its own request;

(3) the process 1 can access the resource when it has received the
REPLY messages from all other processes; and

(4) when the process 1 releases the resource, it transmits any REPLY
messages that it withheld in (2).

With this method, 2(N-1) messages are required for a single
exclusive control.
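
The rule for withholding a REPLY can be sketched as follows. The sketch
models message delivery as direct function calls, represents a request as
a (timestamp, process number) pair so that ties are broken by process
number, and uses hypothetical names (Process, on_request, can_enter,
release).

```python
class Process:
    def __init__(self, pid):
        self.pid = pid
        self.request = None    # (timestamp, pid) while requesting, else None
        self.replies = set()   # pids that have sent REPLY for the current request
        self.deferred = []     # requesters whose REPLY is being withheld


def on_request(receiver, sender, incoming):
    """Step (2): reply at once unless the receiver's own request is older."""
    if receiver.request is None or incoming < receiver.request:
        sender.replies.add(receiver.pid)       # REPLY is sent
    else:
        receiver.deferred.append(sender)       # REPLY is withheld


def can_enter(proc, n):
    """Step (3): all N-1 other processes have replied."""
    return len(proc.replies) == n - 1


def release(proc):
    """Step (4): send the REPLY messages that were withheld."""
    proc.request = None
    proc.replies.clear()
    for waiting in proc.deferred:
        waiting.replies.add(proc.pid)
    proc.deferred = []


a, b, c = Process(0), Process(1), Process(2)
a.request, b.request = (1, 0), (2, 1)      # the request of a is older
for receiver in (b, c):
    on_request(receiver, a, a.request)
for receiver in (a, c):
    on_request(receiver, b, b.request)
a_first = can_enter(a, 3) and not can_enter(b, 3)
release(a)
b_next = can_enter(b, 3)
```

Here a enters first because b replies to the older request, while a
withholds its REPLY to b until it releases the resource.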

In both of the methods described above, a process must obtain unanimous
permission from all other processes before it accesses the resource.
Thomas's majority voting method relaxes this unanimity condition so
that permission need only be obtained from more than half of the
processes. With Thomas's method, the number of processes involved, and
hence the number of messages required for a single exclusive control,
can be halved. If the votes are weighted, a process needs to obtain
more than half of the votes held by all the processes, rather than
permission from more than half of the processes.
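
The two permission conditions amount to simple counting. The sketch below
is an illustration only; the function names and the vote weights are
hypothetical.

```python
def has_majority(granted, all_processes):
    """True when more than half of the processes have given permission."""
    return 2 * len(granted) > len(all_processes)


def has_weighted_majority(granted, votes):
    """True when the granting processes hold more than half of all votes.

    votes maps each process to the number of votes it holds.
    """
    return 2 * sum(votes[p] for p in granted) > sum(votes.values())


processes = ["p1", "p2", "p3", "p4", "p5"]
votes = {"p1": 3, "p2": 1, "p3": 1, "p4": 1, "p5": 1}   # hypothetical weights

plain = has_majority({"p1", "p2", "p3"}, processes)       # 3 of 5 processes
weighted = has_weighted_majority({"p1", "p2"}, votes)     # 4 of 7 votes
```

Any two majorities share at least one process, which is why two
requesters cannot both obtain permission at the same time.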

Maekawa's algorithm requires far fewer messages [Maekawa 85].

Let Si and Sj denote the groups of processes associated with the
processes i and j, respectively (1 ≤ i, j ≤ N). To access the resource,
a process sends its request messages to the members of its own group
and receives their replies. The groups must satisfy the following
conditions:

(1) for all i, j: Si ∩ Sj ≠ ∅

(2) |S1| = |S2| = ... = |SN| = K (constant)

(3) for all i: |{j | Sj ∋ i}| = D (constant)

Further, provided that

(4) for all i: i ∈ Si

the actual number of messages transmitted can be reduced. Under these
conditions, K can be set to approximately √N. This means that in
Maekawa's algorithm a process must communicate with at least √N
processes to implement exclusive control. If the Si are calculated to
satisfy conditions (1) to (4), exclusive control is implemented with
O(√N) messages. Refer to [Maekawa 85] for how to calculate the Si.
Exclusive control is performed in the following order.

(1) A process i requesting a resource transmits a REQUEST message
to all the members of Si.

(2) A process in Si that has received the REQUEST message of the
process i is locked by i if it is not yet locked by any other process,
and returns a LOCKED message to i. If the process is already locked
by another process, the REQUEST message of i is placed in a queue.
In that case, if the REQUEST message of the process currently holding
the lock, or any other REQUEST message in the queue, is older than the
REQUEST message of i, a FAILED message is returned to i. Otherwise,
an INQUIRE message is sent to the process currently holding the lock
to ask whether it has obtained all the locks it needs.

(3) A process that has received the INQUIRE message returns a
RELINQUISH message, provided that it has already received a FAILED
message.

(4) A process that has received the RELINQUISH message is released
from its current lock, and the REQUEST message that held the lock is
returned to the queue. The process is then locked by the oldest
REQUEST message in the queue, and a LOCKED message is returned to the
process that sent that REQUEST message.

(5) The process i can access the resource when it has received
LOCKED messages from all members of Si.

(6) When the process i releases the resource, it sends a RELEASE
message to all members of Si.

(7) A process that has received the RELEASE message is released from
its current lock and is locked by the oldest REQUEST message in the
queue. A LOCKED message is returned to the process that sent that
REQUEST message.

Using this method, c√N messages (where c = 3 to 5) are required for
a single exclusive control.
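
The conditions on the groups Si can be checked mechanically. The sketch
below does not use the construction of [Maekawa 85]; it substitutes a
simpler grid arrangement (an assumption made for illustration) in which
Si is the row and the column of process i on a k x k grid. This satisfies
conditions (1) to (4) but gives K = 2√N - 1 rather than approximately √N.

```python
from itertools import combinations


def grid_groups(k):
    """Return {i: S_i} for N = k*k processes laid out on a k x k grid."""
    groups = {}
    for i in range(k * k):
        row, col = divmod(i, k)
        same_row = {row * k + c for c in range(k)}
        same_col = {r * k + col for r in range(k)}
        groups[i] = same_row | same_col
    return groups


def check_conditions(groups):
    """Check conditions (1) to (4); return (K, D) if they all hold."""
    members = list(groups)
    # (1) every pair of groups has a common member
    assert all(groups[i] & groups[j] for i, j in combinations(members, 2))
    # (2) all groups have the same size K
    sizes = {len(s) for s in groups.values()}
    assert len(sizes) == 1
    # (3) every process belongs to the same number D of groups
    counts = {sum(i in s for s in groups.values()) for i in members}
    assert len(counts) == 1
    # (4) every process belongs to its own group
    assert all(i in groups[i] for i in members)
    return sizes.pop(), counts.pop()


K, D = check_conditions(grid_groups(4))    # N = 16 processes
```

Two grid groups always intersect because the row of one process crosses
the column of the other.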

In the algorithms described so far, no process holds an exclusive
right (token) to access the resource. Algorithms that need even fewer
messages can be obtained by relaxing the conditions and passing a token
that grants access to the resource. For example, there is an algorithm
that requires N messages for exclusive control [Suzuki 85], and an
algorithm that requires log N messages between processes by arranging
them in a tree-like topology [Trehel 86].
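
The message counts quoted in this section can be compared for a concrete
N. The function below is only a piece of arithmetic; its name is
hypothetical, c = 3 is taken from the lower end of the range given for
Maekawa's algorithm, and base 2 is assumed for the logarithm, which the
text leaves unspecified.

```python
import math


def messages_per_entry(n, c=3):
    """Messages for one exclusive control, per the formulas in the text."""
    return {
        "Lamport": 3 * (n - 1),
        "Ricart-Agrawala": 2 * (n - 1),
        "Maekawa": c * math.sqrt(n),
        "Suzuki": n,
        "Trehel": math.log2(n),    # base 2 is an assumption
    }


counts = messages_per_entry(100)
```

For N = 100 this gives 297, 198, 30, 100 and about 6.6 messages
respectively.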

9.4.3 Concurrency control

When a plurality of transactions are executed concurrently, the read
and write operations defined in the transactions may be executed in
various orders. In particular, when the same data item in the same
file is read and written, the result of the concurrent execution cannot
be predicted. To prevent this, even when a plurality of transactions
are executed concurrently, they must be controlled in some way so that
the result of the concurrent execution is the same as the result of
executing them one after another. Managing the execution of
transactions in this way is known as serialization, and the method of
control is known as concurrency control. Examples of the results with
and without concurrency control are illustrated in Fig. 9.7.

Concurrency control methods are broadly classified into three types.

(1) method based on lock 

(2) method based on timestamp 

(3) optimistic method 

These methods are described below. 

When a transaction 1 is executing a series of operations on a certain
resource, an inconsistency arises if another transaction accesses the
same resource while the transaction 1 is still operating on it, because
the resource is in an inconsistent state. To keep transactions from
sharing a resource in an inconsistent state, mutual exclusion is
performed whenever a transaction accesses the resource.

If all the transactions are well-formed and two-phase, they can be
serialized [Eswaran 76]. For a transaction to be well-formed, it must
satisfy the following three properties:

(1) a resource is locked before it is accessed;

(2) a resource that is already locked is not locked again; and

(3) all locks are released before the transaction completes.

For a transaction to be two-phase, once a resource has been unlocked
during the transaction, no further locking is performed. The
transaction is thus divided into two phases, a lock phase and an
unlock phase, as illustrated in Fig. 9.8. This locking method is known
as two-phase locking. The lock-based method is known to be more
effective than the other methods and is in common use. However,
deadlock can occur and must be detected and avoided. Deadlock is
described in the next section.
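
The well-formed and two-phase properties can be tested on a recorded
sequence of operations. The sketch below assumes a hypothetical
representation in which a transaction is a list of (action, resource)
pairs with the actions "lock", "access" and "unlock"; it also treats
unlocking a resource that is not held as ill-formed, which the text does
not state.

```python
def is_well_formed(ops):
    """Check properties (1) to (3) for one transaction."""
    held = set()
    for action, res in ops:
        if action == "lock":
            if res in held:           # (2) no locking of a held resource
                return False
            held.add(res)
        elif action == "access":
            if res not in held:       # (1) lock before access
                return False
        elif action == "unlock":
            if res not in held:       # assumption: only held locks are released
                return False
            held.remove(res)
    return not held                   # (3) everything unlocked at the end


def is_two_phase(ops):
    """No lock may follow the first unlock."""
    unlocked = False
    for action, _ in ops:
        if action == "unlock":
            unlocked = True
        elif action == "lock" and unlocked:
            return False
    return True


good = [("lock", "x"), ("access", "x"), ("lock", "y"),
        ("access", "y"), ("unlock", "x"), ("unlock", "y")]
bad = [("lock", "x"), ("access", "x"), ("unlock", "x"),
       ("lock", "y"), ("access", "y"), ("unlock", "y")]
```

The sequence named bad is well-formed but not two-phase, because it locks
y after it has already unlocked x.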

In practice, the question of vital importance for locking is the
granularity of the lock. For example, if a whole file is taken as the
unit of locking, concurrency is reduced markedly. Locking a logical
record of the file is more common.

To increase concurrency further, two types of lock are sometimes
introduced, one for reading and one for writing. We refer to a lock
for reading as a read lock and a lock for writing as a write lock. A
transaction that only reads a particular data item does not conflict
with another transaction that only reads the same data item, so a
plurality of transactions can share a read lock. However, data locked
by a write lock cannot be accessed by any other transaction. Table
9.2 illustrates the lock rule.

Because reads are performed more frequently than writes, this method
appears to improve concurrency, although a read-only transaction can
still completely block the progress of a transaction that writes.
Gifford proposes introducing an "intention-to-write lock"
[Gifford 79]. With this method, a commit lock is used instead of the
write lock. Table 9.3 illustrates the lock rule with the
"intention-to-write lock". While a write is being performed, the data
is locked by the intention-to-write lock, and when the transaction
commits, the intention-to-write lock is changed to the commit lock.
While data is locked by the intention-to-write lock, reads can still be
executed concurrently.
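
The two lock rules can be written as compatibility tables. The read and
write entries follow the description above. For the intention-to-write
scheme, only the fact that reads proceed under an intention-to-write lock
comes from the text; the remaining entries (a commit lock excludes
everything, and two intention-to-write locks conflict) are assumptions
based on the commonly cited form of Gifford's rule.

```python
# held lock -> set of lock requests that may be granted alongside it
READ_WRITE = {
    "read": {"read"},
    "write": set(),
}

INTENTION_TO_WRITE = {
    "read": {"read", "i-write"},
    "i-write": {"read"},         # reads continue while a write is pending
    "commit": set(),             # assumption: commit excludes all other locks
}


def can_grant(table, held, requested):
    """True if `requested` is compatible with every lock in `held`."""
    return all(requested in table[h] for h in held)


shared_read = can_grant(READ_WRITE, ["read"], "read")
read_during_write = can_grant(INTENTION_TO_WRITE, ["i-write"], "read")
commit_blocked = not can_grant(INTENTION_TO_WRITE, ["read"], "commit")
```

Under these tables a reader delays only the final commit of a writer, not
the writing itself.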

In the method based on timestamps, timestamps are assigned to all
transactions in advance, and the transactions are serialized by
permitting access only in timestamp order when requests to access a
resource conflict [Bernstein 81]. The basic aim is to make a write
valid only after the reads and writes of earlier transactions have
been performed, and to make a read valid only after the writes of
earlier transactions have been performed.

In the method based on timestamps, a timestamp is assigned to a
transaction when the transaction is created. This timestamp is always
attached to the transaction's access requests for data items. In
addition, each data item records the timestamp Tw of the transaction
that last wrote it and the timestamp Tr of the transaction that last
read it. Figure 9.9 illustrates concurrency control based on
timestamps. Reads and writes are processed as follows.

(1) Suppose that a read request with timestamp T is made. If T < Tw,
that is, if a transaction newer than the requester has already
performed a write, the requesting transaction is aborted. If T ≥ Tw,
the read is performed and Tr is updated to max(Tr, T).

(2) Suppose that a write request with timestamp T is made. If
T < max(Tr, Tw), that is, if a transaction newer than the requester
has already performed either a read or a write, the requesting
transaction is aborted. If T ≥ max(Tr, Tw), the write is performed
and Tw is updated to max(Tw, T).

When a transaction is aborted, all the updates performed by the
transaction are discarded, and execution is attempted again with a new
timestamp. This is known as a rollback. When rollbacks occur
frequently, efficiency undoubtedly declines markedly.
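
Rules (1) and (2) translate almost directly into code. The sketch below
is a single-threaded illustration; the names Item, Abort, read and write
are hypothetical, and discarding the updates of an aborted transaction is
left to the caller.

```python
class Abort(Exception):
    """Raised when the requesting transaction must be aborted and retried."""


class Item:
    def __init__(self, value=None):
        self.value = value
        self.tr = 0    # largest timestamp of any transaction that read the item
        self.tw = 0    # largest timestamp of any transaction that wrote the item


def read(item, t):
    if t < item.tw:                        # rule (1)
        raise Abort(f"read {t} is older than write {item.tw}")
    item.tr = max(item.tr, t)
    return item.value


def write(item, t, value):
    if t < max(item.tr, item.tw):          # rule (2)
        raise Abort(f"write {t} is older than an earlier access")
    item.value = value
    item.tw = max(item.tw, t)


x = Item(0)
write(x, 2, "a")           # Tw becomes 2
seen = read(x, 3)          # Tr becomes 3
try:
    write(x, 1, "b")       # older than both Tr and Tw
    aborted = False
except Abort:
    aborted = True
```

The last write is rejected because transactions with timestamps 2 and 3
have already accessed the item.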

The write rule of Thomas attempts to improve efficiency by handling
writes (item (2) above) as follows. When the timestamp T of a write
request satisfies Tw < T < Tr, in other words when a transaction newer
than the requester has already read a value older than the one the
requester is trying to write, the transaction that made the write
request is aborted. If T < Tw, nothing is done. In all other cases the
write is performed and Tw is updated to max(Tw, T). This method
reduces the chance of an abort caused by conflicting writes, because
it ignores write requests whose values would be overwritten before any
transaction reads them.
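
The three cases of the write rule can be sketched as a single function.
The sketch follows the cases in the order given above; the function name
and the dictionary representation of a data item are assumptions.

```python
def thomas_write(item, t, value):
    """Apply a write request with timestamp t to item.

    item is a dict with the keys "value", "tr" and "tw".
    Returns "abort", "ignore" or "write".
    """
    if item["tw"] < t < item["tr"]:
        return "abort"             # a newer transaction read an older value
    if t < item["tw"]:
        return "ignore"            # the value is already overwritten
    item["value"] = value
    item["tw"] = max(item["tw"], t)
    return "write"


item = {"value": "v5", "tr": 8, "tw": 5}
outcomes = [
    thomas_write(item, 6, "v6"),   # Tw < T < Tr
    thomas_write(item, 3, "v3"),   # T < Tw
    thomas_write(item, 9, "v9"),   # newer than every earlier access
]
```

Some textbook statements of the rule test T < Tr first and abort in that
case regardless of Tw; the sketch keeps the cases as stated in this
section.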

Further, a method known as multiversion is used to avoid aborts on
reads. In this method, a write does not overwrite the original value;
instead, it records the pair (Tw, V), consisting of the timestamp Tw
of the writing transaction and the written content V, as a new
version.

(1) Suppose that a read request with timestamp T is made. The content
of the most recent of the versions whose timestamp Tw is smaller than
T is read.

(2) Suppose that a write request with timestamp T is made. If a
transaction with a timestamp larger than T has already read the most
recent version with T > Tw, the write is rejected and the transaction
is aborted. Otherwise, a new version is created.
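
The multiversion rules can be sketched as follows. Recording the largest
read timestamp separately for each version is an assumption needed to
apply rule (2); the names MultiVersionItem and Abort are hypothetical, and
transaction timestamps are assumed to be positive so that the initial
version with timestamp 0 precedes them all.

```python
class Abort(Exception):
    """Raised when a write is rejected and its transaction must abort."""


class MultiVersionItem:
    def __init__(self, initial):
        # each version: [write timestamp Tw, content V, largest read timestamp]
        self.versions = [[0, initial, 0]]

    def _latest_before(self, t):
        """Most recent version whose write timestamp is smaller than t."""
        older = [v for v in self.versions if v[0] < t]
        return max(older, key=lambda v: v[0])

    def read(self, t):                         # rule (1): never aborts
        version = self._latest_before(t)
        version[2] = max(version[2], t)
        return version[1]

    def write(self, t, value):                 # rule (2)
        version = self._latest_before(t)
        if version[2] > t:
            raise Abort(f"version {version[0]} was already read at {version[2]}")
        self.versions.append([t, value, 0])


x = MultiVersionItem("v0")
x.write(5, "v5")
old = x.read(3)            # an old reader still finds the initial version
new = x.read(8)            # a newer reader sees the version written at 5
try:
    x.write(6, "v6")       # too late: timestamp 8 already read version 5
    rejected = False
except Abort:
    rejected = True
```

The read with timestamp 3 succeeds even though a newer write exists,
which is the abort that the single-version rule would have caused.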

Conservative timestamp ordering is a method that never needs to
abort and restart a transaction. It first collects the requests
from all the transactions and then processes the oldest request (i.e.
the request with the minimum timestamp). This rules out the later
arrival of a request carrying a timestamp older than that of a request
already executed, so transactions are never aborted. However, the
method has the problem that non-conflicting requests are also
scheduled in timestamp order. To solve this problem, the transactions
can be analyzed beforehand to determine which of them conflict, and the
method applied only within each class of conflicting transactions.

To solve the problems of the lock-based and timestamp-based methods,
Kung and Robinson proposed the optimistic method [Kung 81]. The idea
is that if conflicting accesses to the same data item by two
concurrently executing transactions are known to be infrequent, there
is no need to check for a conflict on every read or write. The
transactions are allowed to execute concurrently, and when a
transaction completes, the sets of data items that it read and wrote
are analyzed to check whether a conflict has actually occurred. If a
conflict has occurred, the transaction is aborted. A copy of the data
is kept for each transaction, and the actual operations are performed
on that copy. These operations cannot be observed by other
transactions.
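
The check performed at completion can be sketched as follows. The
validation rule used here (a transaction aborts if any transaction that
committed after it started wrote an item it read) is one common form and
is an assumption, as are the names Store and Transaction; the sketch is
single-threaded, so validation and publication are trivially atomic.

```python
class Store:
    def __init__(self, data):
        self.data = dict(data)
        self.committed = []     # write sets of committed transactions, in order


class Transaction:
    def __init__(self, store):
        self.store = store
        self.start = len(store.committed)   # commits made before this one began
        self.copy = {}                      # private copy, invisible to others
        self.read_set = set()

    def read(self, key):
        self.read_set.add(key)
        return self.copy.get(key, self.store.data[key])

    def write(self, key, value):
        self.copy[key] = value

    def commit(self):
        """Validate, then publish the private copy. Returns False on abort."""
        for write_set in self.store.committed[self.start:]:
            if write_set & self.read_set:
                return False                # a conflict actually occurred
        self.store.data.update(self.copy)
        self.store.committed.append(set(self.copy))
        return True


store = Store({"x": 1, "y": 1})
t1, t2 = Transaction(store), Transaction(store)
t1.write("x", t1.read("x") + 1)
t2.write("x", t2.read("x") + 10)
first = t1.commit()        # no earlier commit touched x
second = t2.commit()       # t1 wrote x after t2 had started
```

Only t1 takes effect; t2 would be re-executed on a fresh copy.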

9.4.4 Solution to deadlock

This section describes methods of dealing with deadlock, which is a
problem particular to lock-based concurrency control.

In general, there are two measures against deadlock: prevention and
detection. The simplest method of prevention is to declare all the
resources that a transaction will use and lock them all when the
transaction starts. This method is ideal when resources are held only
briefly; however, resource utilization may deteriorate, and there is a
danger of starvation. Another method is to assign an order to the
types of resources and lock them in that order, although a decline in
resource utilization is unavoidable.

A so-called "wait-for graph" is commonly used to detect deadlock.
This is a graph with a directed edge from a transaction B to a
transaction A whenever the transaction B is waiting for the
transaction A to unlock a resource that A has locked. A deadlock has
occurred exactly when this graph contains a closed path of directed
edges. Once a deadlock is detected, it must be decided which
transaction to abort. The following methods are used to detect a
closed path.

(1) When a transaction attempts to access a resource, a deadlock
detection message is transmitted along the directed edges leaving the
transaction, and a check is made to see whether the message returns to
the original transaction.

(2) Each transaction stores the set of all transactions that can be
reached from it along the directed edges (the reachable set). When
the transaction A attempts to access a resource locked by the
transaction B, the transaction A checks whether A itself is contained
in the reachable set stored by the transaction B.

Method (1) is inefficient because the check requires messages to be
transmitted one after another. Method (2) has the problem that it is
not easy to update the reachable sets when a lock is released.
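
Detection of a closed path can be sketched with a depth-first search
over the wait-for graph. The sketch assumes that the whole graph is
available in one place as a dictionary, which sidesteps the distributed
message passing of method (1); the function name find_cycle is
hypothetical.

```python
def find_cycle(wait_for):
    """Return a list of transactions forming a closed path, or None.

    wait_for maps each transaction to the transactions it is waiting for.
    """
    visiting = []      # transactions on the path currently being explored
    done = set()       # transactions known not to lead to a closed path

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            return visiting[visiting.index(node):]
        visiting.append(node)
        for nxt in wait_for.get(node, ()):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in wait_for:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


deadlocked = find_cycle({"B": ["A"], "A": ["C"], "C": ["B"]})
healthy = find_cycle({"B": ["A"], "A": []})
```

Any transaction on the returned path is a candidate to abort in order to
break the deadlock.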

Another measure against deadlock is the lock timeout. In this
method, a time limit is attached to a lock, and once the limit has
expired the lock is released when an access request is received from
another transaction. This method is commonly used in file servers.
The problem with it is the difficulty of choosing the time limit,
because the execution time of a transaction varies with the load.

Figure 9.7 Examples of Concurrency Control
Figure 9.8 Two-phase locking
(b) Concurrency control
Table 9.2 Read lock and Write lock
Table 9.3 Lock rule introducing the intention-to-write lock
Figure 9.9 Timestamp method